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Abstract 

Parameter inference is a fundamental problem in data-driven modeling. Given ob¬ 
served data that is believed to be a realization of some parameterized model, the aim is 
to find parameter values that are able to explain the observed data. In many situations, 
the dominant sources of uncertainty must be included into the model, for making reliable 
predictions. This naturally leads to stochastic models. Stochastic models render param¬ 
eter inference much harder, as the aim then is to find a distribution of likely parameter 
values. In Bayesian statistics, which is a consistent framework for data-driven learning, 
this so-called posterior distribution can be used to make probabilistic predictions. We 
propose a novel, exact and very efficient approach for generating posterior parameter dis¬ 
tributions, for stochastic differential equation models calibrated to measured time-series. 
The algorithm is inspired by re-interpreting the posterior distribution as a statistical me¬ 
chanics partition function of an object akin to a polymer, where the measurements are 
mapped on heavier beads compared to those of the simulated data. To arrive at distribu¬ 
tion samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple 
time-scale integration. A separation of time scales naturally arises if either the number of 
measurement points or the number of simulation points becomes large. Furthermore, at 
least for ID problems, we can decouple the harmonic modes between measurement points 
and solve the fastest part of their dynamics analytically. Our approach is applicable to a 
wide range of inference problems and is highly parallelizable. 


1 Introduction 

Modeling a dynamical process starts with a basic model that is usually obtained from a 
more or less deep insight into the nature of the process. The next step is the determination 
of the parameters of the model, based on observed data, which is generally a highly 
nontrivial task, in particular when complex behavior of such systems needs to be predicted, 
or when the measurements are noisy. A minimal example is a perceptron |221124) , the 
basic element of a neuronal network, that predicts the double cosine value associated with 
the input of the corresponding sine function plus the sine’s value at a fixed earlier time. 

While this task can easily be achieved for clean data using gradient descent learning, for 
noisy input data, this is largely impossible, as noise cannot be learned. The result is a 
distribution of potential parameter values (Fig. 0). 
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Figure 1: General problem setting: Parameter estimation (synaptic weights wi,W 2 ) of a per- 
ceptron, from a) noiseless, b) noisy data (noise sampled from a flat distribution over the 
interval [—0.2,0.2]). Whereas for noiseless data the estimates perfectly converge, for noisy 
data the estimates the system attempts to also include the noise, leading to a nontrivial dis¬ 
tribution of the parameter estimates, and rendering the extraction of the optimal parameters 
a nontrivial task. Open circles: location of the optimal parameters in the noiseless case. 


In Bayesian statistics, knowledge about parameters is expressed by probability distri¬ 
butions and learning is implemented as an update rule on these distributions (see, e.g. 
[3]). If a constant noise term is added to the output of a deterministic model, such as in 
the perceptron example above, Bayesian inference is straightforward. If noise enters the 
formulation of the model equations, however, Bayesian inference all of a sudden becomes 
computationally very expensive. 

In our paper, we demonstrate the calibration of ordinary ID stochastic differential 
equation (SDE) models based on noisy time series, and the quantification of the resulting 
parametric uncertainty. The generic approach that we use is exemplified by a simple SDE 
model from hydrology. 

Problems of this kind are commonly solved by Monte Carlo (MC) methods that are 
based on simulating model realizations and comparing them to the data. Popular meth¬ 
ods are particle filters laiis], Metropolis-within-Gibbs algorithms [20] or Approxi¬ 
mate Bayes Computations [I7llll|30l|29]. A major problem with these simulation-based 
methods is, however, their inefficiency in the presence of many data points or high di¬ 
mensions. One solution is to map the output space to a smaller dimensional space of 
summary statistics, and accept/reject proposed model parameters depending on how well 
associated model runs conform with the data in terms of these summary statistics m- 
However, how to choose the summary statistics to achieve a significant representation of 
the posterior parameter distribution is a largely unsolved problem. 

These difficulties can be remedied with a reinterpretation of the Bayesian posterior 
distribution as the partition function of a statistical mechanics system and by simulating 
the dynamics of the latter. After discretizing the time of the original problem, we are 
led to a problem akin to the statistical mechanics of a polymer with harmonic bonds 
in an exterior potential [5]. In this framework, the measurements are interpreted as an 
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additional exterior potential that acts only on the polymer’s ‘measurement beads’ and 
confines their dynamics within the measurement uncertainty. The model parameters are 
interpreted as additional degrees of freedom coupling to all the beads of the polymer. To 
simulate the dynamics of this system, we apply the Hamiltonian Monte Carlo (HMC) 
algorithm [5], which combines Molecular Dynamics mm with the Metropolis algorithm 
Hg. Compared to traditional methods, the use of the Hamiltonian approach achieves 
much higher acceptance rates since data points are already used for the suggestion of new 
parameters, and thus model realizations incompatible with the data are never considered. 
The drawback is that the model equations need to be known and derivatives have to be 
calculated. 

HMC requires two sets of parameters to be tuned: (i) the parameters that define the 
kinetic energy of the statistical mechanics system and (ii) the parameters that define the 
numerical integration scheme of Hamilton’s equation in the molecular dynamics part of 
the HMC algorithm. Efficiency of HMC algorithms can be gained if the kinetic term is 
made dependent on the configuration geometry of the statistical mechanics system. If the 
Riemann geometry of the parameter space of statistical models is taken into account, then 
the simulated search of paths across this manifold samples the target density in an utmost 
efficient way [ 12 ) . Unfortunately, this procedure is both demanding and computationally 
costly, depending strongly on the quality of the space’s extracted geometry. 

Here we explore a computationally simpler approach, which, to our knowledge, has 
never been applied in the context of Bayesian inference before. Depending on the number 
of discretization points needed to approximate the original SDE system and the num¬ 
ber of measurement points, the dynamics of the statistical mechanics system happen on 
very different time scales. This suggests a multiple time scale integration technique for 
the simulation of the statistical mechanics system ^3]. We will show that for ID SDE 
we can always find a parametrization, which decouples the harmonic modes in between 
measurement points from both the measurement points and the model parameters and 
allows for an analytical time-saving solution of the fastest part of the dynamics. Whilst 
for higher dimensional problems it is not always possible to solve part of the dynamics 
analytically, we believe that scale separation alone will render many SDE amenable to a 
full-fledged Bayesian inference with time-series. In fact, scale separation appears to be 
a generic feature if the dynamics of the SDE requires a large number of discretization 
points. 

2 Inference Problem Setting 

Consider, for simplicity and concreteness, a reservoir dynamics S{t) that on the obser¬ 
vation time-scale is linear, with other (inflow and outflow) processes happening at much 
shorter time scales, so that they can be described by white noise. Eurthermore, assume 
that this noise scales linearly with the system state S{t). The model equation is thus 
given by the SDE 



( 1 ) 


where r{t) denotes the time varying rain input, K denotes the retention time, 7 is the 
noise strength, and r]{t) indicates the white noise property, i.e.. 




( 2 ) 


Eq. 0 is to be understood in the Stratonovich sense |25j . 

Properties of a transformed version of Q have been derived, for constant input 
[9l[23l[n]. Here, suffice it to say that, for constant input r{t) = tq, the equilibrium 
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distribution, Peq{S), is an inverse gamma distribution with scale parameter 2KrQ/j and 
shape parameter (2 + 7)/7 (see Sect. 3.1 1 , i.e., 


Peq{S) OC 5'-2(l+7)/7g-2ifro/(7S) _ 


(3) 


The mean of this expression equals the equilibrium solution of the unperturbed system 
(7 = 0) {S)eq = Kro and its variance, for 7 < 2, is given by {{S — (S)eq)"^)eq = ^^^o7/(2~ 
7 ), which, for 7 > 2, is seen to diverge. The power-law decay of the inverse gamma 
distribution is reminiscent of the invariance of Eq. Q under re-scaling of both r(t) and 
S{t). In real-world hydrology, for which Eq. 0 is a model, indeed often fat-tailed error 
distributions are observed m- 

While this equation was motivated by a popular hydrological model HUS], it is by no 
means restricted to this context. By means of the transformation S{t) = Eq. Q 

turns into a model that has been suggested, e.g., as a phenomenological description of the 
dynamics of the neutron density in nuclear reactors [9]. 

In our setting, the input r(t) is a smooth and nowhere vanishing function. We assume 
the observed time-series, j/s, to be the outflow of the reservoir, S{t)/K, observed at times 
0 = < ^2 < ■ ■ • < tn+i = T, with multiplicative independent log-normal errors. 


ln(?/s) = In -hCTEs , s=l,...,n-M, (4) 

where the Cg are uncorrelated standard normal errors. For simplicity, we assume a as well 
as the input r(t) to be known, so that we are left with the task of inferring parameter com¬ 
binations {K, 7 ) that are compatible with the data given by Eq. Q in the Bayesian sense, 
where knowledge about parameters is expressed in terms of probability distributions. 

We start our treatise by assuming that we have prior knowledge about a parameter 
vector, in the form of a probability distribution, /prior measured data, y, 

believed to be a realization of the model. The posterior knowledge, combining prior 
knowledge with the one acquired from data, is calculated by means of equation 


/post(^|y) 


/prior(^)-^(y|^) 

Tf~Jr)lAyWW' 


(5) 


where L(y|@) is the probability distribution for model outputs given model parameters, 
evaluated at the measured data (the infamous likelihood function). 

Before we set out to derive from Eqs. 0 . 0 and Q the likelihood function, we 
express the parameters and state variables by dimensionless quantities. Due to scale- 
invariance of the noise term, 7 is already dimensionless. State variable S(t) and parameter 
K are replaced by the dimensionless quantities q{t) and (3, respectively, which are defined 
by the transformations j3 = sjT^jK and S{t) = {T^r{t )/In these new variables 
and parameters, Eq. 0 becomes the nonlinear SDE with constant noise 

Qit) = - ^p{t) + , ( 6 ) 


with p{t) = {T/f3){d/dt) [In (r(t))] -H (2 -h ^)l3/{2-f). 

The probability P{qi,T\qQ,0) of finding the system in a state <71 at time t = T if it 
was in an initial state qo at time t = 0 , is expressed as a path-integral as 

P(gi, Tjgo, 0) = ^ / e-^[«’«l 5 (( 7 (T) - q,)d{q{0) - q^)Vq , (7) 


where the integral extends over all paths g : [ 0 , T] —>■ K and where the path-measure Vq 
is formally written as the infinite product Pq = 3^^® action is a functional on 
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the space of paths and reads [14] 


5[g,g] 



1 

2 


Tq{t) + p{t) 


7 


2 


27 


( 8 ) 


This action includes the Jacobian that is introduced when changing coordinates from pit) 
to q{t). 

If we denote the parameter vector 9 = (/3,7)^ and assume a flat prior, the posterior 
([^ is, as a function of 6, proportional to the likelihood function 


/post(^|y) 



1 

2 


n+1 

E 


S = 1 


iMys/rjts)) - I3q{t,)f 

rr‘^ 


5[g,9] 


Vq. 


(9) 


Whereas the first term in the exponent describes the log-probability distribution of model 
outputs, for given model parameters, inputs and a system realization q(ts), the second 
term is the log-probability of the associated system realization q{t). 

When applying this approach now to real-world problems, instead of undertaking a 
prohibitive numerical computation of the path integral, we apply HMC to sample parame¬ 
ter vectors from a joint distribution of system realizations and model parameters given by 
an appropriate discretization of the action of the path-integral. By doing so, we observe 
that we obtain distinct regimes of time scales in the Hamiltonian that can be separated 
(see Sect. 3.2). This time scale separation simplifies and boosts our algorithm; in many 
cases it even permits parts of the required integrations to be done analytically. 


3 Algorithm 

3.1 Inference algorithm 

For the inference algorithm, it is necessary to rewrite action ([^ with the help of the 
time-dependent potential U{q,t) = -I- qp{t) as 


Sb. ?l = f f + i (m*) - } 

+ Uiq{T),T)-U{q{0),0) 


= i -Tq{t)m 


27 


+ + q{T)p{T) - - g(0)p(0). (10) 

7 7 

With the action in this form, we easily derive the equilibrium distribution, for constant 
input r{t) = ro, by plugging 0 and ([To| into the detailed balance condition 


P{qiti\qoto)Peq{qo) = P{qoti\qito)Peq{qi) 


( 11 ) 


and using the transformation q{t) —>• q{—t). We get, since p[t) = 0, 

Peg(<?) OC . 

Back transformation to the original variables leads to Eq. (|^. 

For efflciently drawing parameter samples from ([^ , we interpret the latter as the par¬ 
tition function of a ID statistical mechanics system and simulate its dynamics employing 
the HMC algorithm [8] . The model parameters 9 are interpreted as additional dynamical 
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degrees of freedom coupling to the system variables q{t). Each degree of freedom, q{t) 
and 6, is paired with a conjugate variable, p{t) and tt respectively, so that the system is 
defined by the Hamiltonian 


where 


nnMc{q,0;P,‘^) = K{p,Tr) + V{q,0 ), 




2 ^2 


2mo 


( 12 ) 


(13) 


and V{q, 6) is the negative logarithm of the kernel of The posterior © can then be 
expressed by the phase space path integral 


/post(^|y)« / . 


(14) 


and molecular 


The HMC method, as a combination of the Metropolis algorithm _ _ _ 

dynamics methods Enm, iterates the following steps: 

1. Momenta p(t) and tt are sampled from the Gaussian distributions defined by Eq. (13). 


2. The system is then allowed to evolve in (g, 0;p,7r)-phase space for an arbitrary 
time interval r according to a volume-preserving and time-reversible solution of a 
discretized set of Hamilton equations. 

3. The discretization error on the energy preservation due to the previous step is cor¬ 
rected by a Metropolis acceptance/rejection step. 

The last step is the standard Metropolis algorithm, while the first two steps permit ar¬ 
bitrarily large jumps in phase space, while maintaining an arbitrarily large acceptance 
rate. Each new phase space configuration is associated with a combination of model pa¬ 
rameters 6, which is compatible with the data in the Bayesian sense. Thus, omitting a 
possible burn-in, the parameter marginal of the simulated Markov chain of configurations 
represents a sample of the posterior probability distribution. 


In order to simulate the dynamics of the Hamiltonian (12), we first need to discretize 


the path-integral (14). Let us assume that the measurement time points {?/s}s=i 
of the time series (^are equidistantly distributed on the time interval [0,T], with G = 0 
and = T. Each interval between two consecutive data points is further partitioned 
into j bins, such that we have a total of nj -I- 1 = iV >> 1 discretization points. The 


path-integral (14) is then approximated by an ordinary integral, with the approximate 
path-measure VpVq « J([j dpidqi. The discretized versions of K{p,Tr) and V{q,0) are now 
given by 


N 2 


At A r 1. 


2 2 


2ma ’ 


(15) 


i—2 


7 


_ _e _ Tq,p, 


+ ie-«~ + + "f ('"fa-M-w) - , 

7 7 2 ( 7 ^ 

' s—l 

(16) 

with qi = {qi - qi_i)/At, p* = T\n{r{ti)/r{U-i))/{/3At) -h (2 -F 'y)/3/{2j) and p^ = {pi - 
Pi-i)/At, and where terms of order were neglected. Note that we did not apply 

the mid-point discretization that is associated with the Stratonovich convention. In Eq. 
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(15) this leads to a different dynamics that does not, however, alter the posterior we are 
interested in, and in Eq. (16) we produce errors of the order that we neglect. 

Physically, the discretized Hamiltonian can be identified with a classical polymer chain 
of N beads with harmonic bonds between neighboring beads in an external field [S] . The 
latter consists of two parts, a field that results from the measurements and is felt by the 
measurement beads only (last term on the r.h.s. of Eq. ( |16[ )), and a field that results from 
the dynamics of the original Eq. ([^ and is felt by all the beads. The masses Wj and rUa 
are tunable parameters of the algorithm. Since measurement beads are constrained more 
than intermediate beads, we will assign larger masses to the former. Fig. © shows a 
typical realization of the dynamics of the polymer. Measurement beads only move within 
the measurement uncertainty, whilst the intermediate beads explore much larger regions 
of phase space. 



Figure 2: Simulated polymer chain dynamics, with n + 1 = 11 data points (large circles) 
and j — 1 = 9 intermediate beads (small circles). For other parameters see Sectionsand 
Bottom: initial state, where intermediate beads are on a linear interpolation between data 
points. Top: polymer after Nmc = 1000 iterations of the propagation algorithm (dotted line: 
initial configuration). Clearly the new configuration is mostly determined by the dynamics of 
the light-mass intermediate beads, while the heavy-mass data points move to a much lesser 
extent. 


We have thus reduced the original Bayesian inference problem to simulating the dy¬ 
namics of a linear polymer (cf. Fig. (©)• Each state of this hctitious molecule corresponds 
to a well-defined conhguration in the original phase space, characterized by a set of system 
variables {9i}i=i n ^ parameter vector It is now essential to note that potential 
(16) contains terms of distinct scaling in the potentially large numbers JV and n, that refer 
to dynamics on distinct time-scales. In particular, for large TV, V(q, 6 ) is dominated by its 
harmonic part, and to resolve its dynamics, brute force numerical integration of Hamil¬ 
ton’s equations in step 2 of the HMC algorithm would require a very small discretization 
time-step. 


3.2 Time scale separation 

Whereas an interesting approximate approach would be to employ a partial averaging 
of the fast Fourier modes [7], we will use an exact multiple time scale integration based 
on Trotter’s formula [33| . For this, we introduce so-called staging variables, and diago¬ 
nalize the harmonic part between the measurement points. To this end, we rewrite the 
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discretized harmonic part of the action as 


N 


T , T r (q(s_i)j+i qsj+i) 




S = 1 


jAt 

3 

^E 

k=2 


k 


{k — l)At 


{Q{s-i)j+k (Z(s-i)j+fc) J (1'^) 


with = {{k-l)q(^s_i'jj+k+i+q{s-i)j+i)/k. The boundary beads, corresponding to 

the original measurement points, are not transformed, Ugj+i = 9sj+ii s = 0,... ,n, while, 
for the intermediate staging beads, we apply the coordinate transformations Usj^k = 
qsj+k — q*sj+k^ s = 0,..., n — 1, fc = 2,..., j. Their inverse transformations are given by 


Qsj+l — Usj + l , 


i+i 


k-1 


Qsj-\-k 


E t% — 

73 


l^k 


l-l 




j -k + l 


-Usj + l 


which can be captured by the recursive relation 


_ k-l 1 

qsj+k — Usj+k T ~ qsj+k+1 “t“ —Usj-^-i . 


(18) 

(19) 


( 20 ) 


The momenta are not transformed, which means we are using a non-canonical transfor¬ 
mation. This alters only the dynamics of the system, not the posterior we are interested 
in. 

We now split the Hamiltonian Humc into components according to their scaling be¬ 
havior in n and iV, and write 


^HMC — '^Ln + 'kin + kii , 


where 




At 


Tk 


s=l fe=2 

l!^' {At 


-l)j+k-^ ^^(^_ l)“(s-lb + fcj . 


^ J 2 I — l)j + l) P'^{s — l)j + l) 


+ 9Ta 7 E(“(«-I)i+1 “ 'U'sj+iY 


2jAt 


At 


N 




2ma T . 

oc — 1 1—2 


=-/39. 


fl2 

- - Tq,p, 

27 


+ le-/39« 


qiP2 ■ 


( 21 ) 


( 22 ) 

(23) 


(24) 


Here, we have introduced two masses, M and m!, for the boundary and staging beads, 
respectively. The scaling of these Hamiltonians can be derived from basic properties of 
discretized SDEs, from which we conclude that iti ~ VAt. Furthermore, in agreement 
with the equipartition law we find that pi ~ I/'/At. Accordingly, we find that the 
harmonic part (22), for the staging beads, scales linearly with N. The terms of Eq. (23), 
including both the harmonic part for the boundary beads and the measurement term, 
scale linearly with n. Finally, Eq. (24) neither scales with n nor N. Thanks to the staging 
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variables, 'Hn and 'Hn have become fully decoupled. We use Trotter’s formula [32] in 
order to design a reversible molecular dynamics integrator that takes the presence of the 
different time scales into account. For an appropriate partition of the Hamiltonian, three 
distinct regimes can be distinguished: 

(i) Hn Hn » Hi, 

(n) Hn » Hn ~ Hi, 

(in) Hn >> Hn >> Hi- 

In the following we restrict ourselves to regime (ii), where the number of measurements 
n is assumed to be not too large and/or the measurement error a to be not too small (the 
generalization of the method to the other regimes would, however, be straightforward). 
In this regime we may simply separate the harmonic part of the action for the staging 
beads from the rest and write 

Hhmc=Hn+H'. (25) 

For obtaining reversible integrators, we define the Liouville operators iLN = {■, Hn}, 
iL' = {■, H'}, where {•, •} denote the Poisson brackets that apply to functions on the 
phase space. Trotter’s formula |31| allows us to write the Hamiltonian propagator as 

^i{Ljv+L')T _ ^giLjv(Ar/2)giL'ATgiLjv(Ar/2)^P _|_ j (26) 


for T = PAt. Here, the outer propagator exp[iLAr(AT/2)] reflects much faster dynamics 
than the inner one. However, thanks to our re-parametrization, it describes the dynamics 
of uncoupled harmonic oscillators, which we can readily solve. Masses and frequencies of 


the oscillators are derived from (221 as 


m = m' jAt, ujk = 


Nk 


[k — l)m 


(27) 


The outer propagator becomes 

M(s_i)j+fe(AT/2) = M(s_i)j-+fc(0) cos(wfcAr/2) + sin(a;fcAT/2), (28) 

TTtUJ]" 

P(s_i)j+fe(AT/2) = p(5_i)j+fc(0) cos(wfcAr/2) - mu)uU{s-i)j+k{^) sin(a;fcAr/2), (29) 


for s = 1,... , n and k = 2,..., j. For the inner propagator, we employ the time-reversible 
and volume preserving velocity Verlet algorithm |26j , which leads for the boundary beads 
to 

+ -^^F’(s_i)j_|-i[u(O),0(O)], (30) 

= P(s-i)i+i(0) + “;^(^(s-i)j-i-i[u(O),0(O)] -I-F(s_i)j+i[u(AT),0(Ar)]), 

(31) 


with s = 1,... ,n + 1 and where Fi[u,0] denotes the partial derivative of H'[vl,9] w.r.t. 
Ui- Analogous equations emerge for the model parameters 6 and their momenta tt, by 
exchanging u by d and p by tt, along with the corresponding masses, respectively. For 
the staging beads only the momenta need to be updated (because the associated kinetic 
term is not part of H' , but of Hn)- Thus, with s = 1,..., n and k = 2,..., j as before. 


P{s — l)j + ki^'^) P(s —(0) 

+ ^(^(^-i)j+fc[u(0),6'(0)] -f F’(,_i)j+fc[u(AT),6>(AT)]). (32) 
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The propagators (28) through (32) are applied sequentially P times to calculate the 
system evolution over time r. The proposed configuration, p',7r') is accepted with 

Metropolis probability min ^1, e^HMc(u.®;p>'^)-WHMc(u ;p . xhe next iteration then 

starts with sampling a new momentum vector (p,7r). 

The analytical solution (28) and (29) is one main boosting part of our algorithm. To 
find such a solution, it is important to arrive at model equations of the form (|^, where the 
noise term neither depends on the state variables, nor on the parameters to be inferred. 
In a one dimensional model this can always be achieved through re-parametrization (see, 
e.g., chapter 5 in m)- In higher dimensions, this will not always be possible. But even in 
such cases, we will be able to boost our algorithm through assigning smaller time intervals 
At to the fast dynamics and larger ones to the slow dynamics. 


4 Results 

For our toy system, we have considered a simple sinusoidal input r{t) = sin^ (O.Olt) +0.1. 
A system realization was first obtained from Eq. ([^ using arbitrary units 

of time) and 7true “ Such system realization was then used to generate a synthetic 
time series of observed data according to Eq. Q- The error a was set to 0.1. The input 
signal, the ’’true” system realization and the corresponding data time series are shown in 
Fig. (§. A set of 200 system realizations sampled from the integrand of Eq. ([^ , based on 



Figure 3: System realization (solid line) with synthetic observations (filled circles, with error 
bars indicating the assumed measurement uncertainty). The system response closely follows 
the oscillations of the sinusoidal input (dashed) in a time-delayed manner. Parameters for 
this figure and all following figures: i^true = 50 and 7true = 0.2. 

u + 1 = 11 measurement points and N = 301 discretization points, is shown in Fig. ([^, 
together with the generated synthetic data. These samples were generated with a HMC 
algorithm, where the masses were set (in arbitrary units) to M = 720 for the measurement 
beads, to m! = 130 for the intermediate discretization beads, and to = 150 for both 
the dimensionless parameters /? and 7 . The different dynamics of the heavy measurement 
beads and the light discretization beads can be appreciated in Fig. ([^. 

The Markov chains for parameters K and 7 , obtained after Nmc = 50000 iterations 
of the HMC algorithm, are shown in Figs. ([^ and ([^, respectively. 

The efficiency of the algorithm can be appreciated best by inspecting the system 
evolution in the phase space AT — 7 (Fig. 0 )- The starting point of the algorithm was set 
to a linear interpolation of the data points, together with values for the model parameters 
that were deliberately chosen far off the truth (open circle in Fig. [^. Nevertheless, 
the very first step of the algorithm already takes the system to the vicinity of the true 
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Figure 4: Simulated system realizations associated with synthetic data, based on n + 1 = 11 
measurement points and = 301 discretization points. 


200 

K 

0 

0 Nmc 50000 

Figure 5: Markov chain evolution of the inferred parameter K. 




Figure 6 : Markov chain evolution of the inferred parameter 7 . 


parameter values, where most of its dynamics then occurs. Few excursions lead far away 
from the true parameter values. These explore the heavy tails of the posterior parameter 
distribution. 

The Markov chains (Figs. ([^ and ([^) determine the probability density functions 
(PDF) for K and 7 , respectively. The results obtained by using the built-in kernel density 
estimator provided by Mathematica (version 10) are exhibited in Fig. Q; they are fully 
compatible with the true, to be inferred, parameter values AT^rue O'true- 
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Figure 7: System dynamics in the phase space iF — 7 . The circle represents the initial state, 
while the square corresponds to the true parameter values used to generate the data. 



Figure 8 : Probability density functions for the inferred parameters, (a) K and (b) 7 . The 
true values used to generate the data are represented by the dashed vertical lines. 


5 Implementation 


The algorithm was implemented in CH—h (version C++11) using the open source Adept 
library (version 1 . 1 ; m), which provides a powerful tool for fast reverse-mode automated 
differentiation (AD). Our algorithm benefits greatly from the use of AD. This gives us 
the possibility to modify Eq. Q and therefore the action (10), while leaving the imple¬ 
mentation of the algorithm unaltered. This makes our program extremely flexible and 
suitable for a much broader range of applications than the simple exemplary SDE model 
described here. 

The simulations were run on both serial and parallel implementations of the algorithm 
on a 64-bit Linux system equipped with two 12-core 2.7 GHz processors (Intel Xeon E5- 
2697v2) and 64 GB of memory clocked at 1866 MHz. We used n -|- 1 = 11 measurement 
points and N = 101, 201,..., 501 discretization points. In the Hamiltonian propaga¬ 
tor (26), we set At = 0.25 and P = 3, with a constant total observation time T = 833 
(arbitrary units of time). The initial values of the parameters were set to K = 200 and 


7 = 0.5. 

For example, a complete run with Nmc = 50000 iterations with n = 10 and N = 301 
required about 43 seconds with the serial implementation of the algorithm. In the case 
of our toy system the burn-in phase is extremely short and can be safely ignored. Under 
these conditions the algorithm can be parallelized in a straightforward manner simply by 
breaking up the Markov chain into several smaller independent chains. The execution time 
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with a fixed-size problem scales in a reasonably linear way with the number of processes 
(strong scaling). Our example could be therefore run in only about 3 seconds using 16 
processors. An alternative strategy, suitable for long time-series, would be to parallelize 
the updates of the polymer beads in each step of a single MC chain. 

6 Conclusions 

We presented a novel, extremely efficient and versatile approach for data-based SDE pa¬ 
rameter estimation. Our algorithm obtains its strength from translating the problem of 
generating posterior parameter samples into the problem of simulating the dynamics of 
a statistical mechanics system; the main novelty in our algorithm is the exploitation of 
the fact that this dynamics generically happens on very different time scales. Further¬ 
more, at least for ID systems, our approach also allows for an analytical, and therefore 
computationally efficient, integration of the fastest part of the dynamics. 

In most application cases, our choice of a fixed diagonal mass matrix, for reasonable 
choices of masses - heavier for the measurement beads, lighter for the discretization beads - 
can be expected to work well. Nonetheless, if the curvature of the potential varies strongly, 
it might be beneficial to adapt the mass matrix to the local curvature as suggested in [12] . 
For such cases, a combination of the scale separation method proposed in this paper 
with the local mass matrix adaptation of m might be the most efficient solution. This 
extension, however, comes at the price of a computational overhead (second derivatives 
have to be calculated and implicit equations have to be solved), and we will no longer be 
able to solve part of the dynamics analytically. On a more general level, given the wide 
field of very distinct applications, optimal parameter inference for SDF models will not 
be provided by one single approach, but will require a set of tools, to make the optimal 
choice from. We expect statistical physics to continue making strong contributions toward 
this aim. 

The structure of our algorithm is well suited to parallelization, which is important in 
particular if we deal with a high number of measurements. The algorithm can easily be 
adapted to other inference problems, such as higher dimensional SDF, and SDF coupled 
to ODF. These adaptations and extensions will, however, be addressed in future works. 
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